Analyzing Processor Usage
Most often, the processor
resource is the first one analyzed when there is a noticeable decrease
in system performance. For capacity-analysis purposes, you should
monitor two counters: % Processor Time and Interrupts/sec.
The % Processor Time
counter indicates the percentage of overall processor utilization. If
more than one processor exists on the system, an instance for each one
is included along with a total (combined) value counter. If this counter
averages a usage rate of 50% or greater for long durations, you should
first consult other system counters to identify any processes that might
be improperly using the processors or consider upgrading the processor
or processors. Generally speaking, consistent utilization in the 50%
range doesn’t necessarily adversely affect how the system handles given
workloads. When the average processor utilization spills over the 65% or
higher range, performance might become intolerable. If you have
multiple processors installed in the system, use the _Total instance of
the % Processor Time counter to determine the average usage across all
processors.
The Interrupts/sec counter
is also a good gauge of processor health. It indicates the number of
device interrupts (either hardware or software generated) that the
processor is handling per second. Like the Page Faults/sec counter
mentioned in the section “Monitoring System Memory and Pagefile Usage,”
this counter might display very high numbers (in the thousands) without
significantly impacting how the system handles workloads.
Conditions that could indicate a processor bottleneck include the following:
Average of % Processor Time is consistently over 60%–70%. In addition,
spikes that occur frequently at 90% or greater could also indicate a
bottleneck even if the average drops below the 60%–70% mark.
Maximum of % Processor Time is consistently over 90%.
Average of the System object's Context Switches/sec counter is consistently over 20,000.
The System object's Processor Queue Length counter is consistently greater than 2.
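The conditions above can be sketched as a simple check over sampled counter values. The following is an illustrative Python sketch, not a Windows API call: the sample lists stand in for values collected from Performance Monitor, the function name is an assumption, and the thresholds mirror the guidelines listed above.

```python
def cpu_bottleneck_signals(cpu_samples, ctx_switch_samples, queue_samples):
    """Apply the rule-of-thumb thresholds to sampled counter values.

    cpu_samples        -- % Processor Time samples
    ctx_switch_samples -- System\\Context Switches/sec samples
    queue_samples      -- System\\Processor Queue Length samples
    """
    signals = []
    if sum(cpu_samples) / len(cpu_samples) > 60:
        signals.append("average % Processor Time over 60%")
    # Frequent spikes at 90%+ can indicate a bottleneck even when the
    # overall average stays below the 60%-70% mark.
    if sum(1 for s in cpu_samples if s >= 90) / len(cpu_samples) > 0.25:
        signals.append("frequent spikes at 90% or greater")
    if sum(ctx_switch_samples) / len(ctx_switch_samples) > 20000:
        signals.append("average Context Switches/sec over 20,000")
    if sum(queue_samples) / len(queue_samples) > 2:
        signals.append("Processor Queue Length consistently greater than 2")
    return signals
```

Feeding in values exported from a Performance Monitor counter log would flag which of the four conditions hold over the sampled interval.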
By default, the CPU tab in Resource Monitor, shown in Figure 3,
provides a good high-level view of current processor activity. For more
advanced monitoring of processors, use the Performance Monitor snap-in
with the counters discussed previously.
Evaluating the Disk Subsystem
Hard
disk drives and hard disk controllers are the two main components of
the disk subsystem. The two objects that gauge hard disk performance are
PhysicalDisk and LogicalDisk. Although the disk subsystem components are
becoming more and more powerful, they are often a common bottleneck
because their speeds are orders of magnitude slower than those of other
resources. The effects, though, can be minimal and maybe even
unnoticeable, depending on the system configuration.
To support the
Resource Monitor’s Disk section, the physical and logical disk counters
are enabled by default in Windows Server 2008 R2. The Disk section in
Resource Monitor, shown in Figure 4,
provides a good high-level view of current physical and logical disk
activity (combined). For more advanced monitoring of disk activity, use
the Performance Monitor component with the desired counters found in the
Physical Disk and Logical Disk sections.
Monitoring with the
PhysicalDisk and LogicalDisk objects does come with a small price. Each
object incurs a small amount of resource overhead while it is being used
for monitoring. As a result, you might want to keep these counters
disabled unless you are actively using them for monitoring purposes.
So, what specific disk
subsystem counters should be monitored? The most informative counters
for the disk subsystem are % Disk Time and Avg. Disk Queue Length. The %
Disk Time counter monitors the time that the selected physical or
logical drive spends servicing read
and write requests. The Avg. Disk Queue Length monitors the number of
requests not yet serviced on the physical or logical drive. The Avg.
Disk Queue Length value is an interval average; it is a mathematical
representation of the number of delays the drive is experiencing. If this
value is frequently greater than 2, the disks are not equipped to
service the workload, and delays in performance might occur.
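As a sketch of how that rule of thumb might be applied, the following illustrative Python function (the function name and threshold defaults are assumptions, not part of any Windows API) flags a drive whose sampled Avg. Disk Queue Length values are frequently above 2:

```python
def disk_queue_overloaded(queue_samples, threshold=2.0, frequent=0.5):
    """Return True when more than `frequent` (a fraction) of the sampled
    Avg. Disk Queue Length values exceed `threshold`."""
    over = sum(1 for q in queue_samples if q > threshold)
    return over / len(queue_samples) > frequent
```

A sustained True result suggests the drive cannot keep up with the workload and that requests are queuing faster than they are serviced.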
Monitoring the Network Subsystem
The network subsystem is by far
one of the most difficult subsystems to monitor because of the many
different variables. The number of protocols used in the network,
network interface cards, network-based applications, topologies,
subnetting, and more play vital roles in the network, but they also add
to its complexity when you’re trying to determine bottlenecks. Each
network environment has different variables; therefore, the counters
that you’ll want to monitor will vary.
The information that you’ll
want to gain from monitoring the network pertains to network activity
and throughput. You can find this information with the Performance
Monitor alone, but it will be difficult at best. Instead, it’s important
to use other tools, such as Network Monitor, in conjunction with Performance Monitor to get the best representation
of network performance as possible. You might also consider using
third-party network analysis tools such as network sniffers to ease
monitoring and analysis efforts. Using these tools simultaneously can
broaden the scope of monitoring and more accurately depict what is
happening on the wire.
Because the TCP/IP suite
is the underlying set of protocols for a Windows Server 2008 R2 network
subsystem, this discussion of capacity analysis focuses on this
protocol.
Note
Windows Server 2008 R2 and
Windows 7 deliver enhancements to the existing Quality of Service (QoS)
network traffic–shaping solution available for Windows XP and Windows
Server 2003. QoS uses Group Policy to shape and give priority to network
traffic without recoding applications or making major changes to the
network. Network traffic can be “shaped” based on the application
sending the data, the source and/or destination IP addresses, the TCP or
UDP protocol, the TCP or UDP ports used, or any combination thereof.
More information on QoS can be found at Microsoft TechNet:
http://technet.microsoft.com/en-us/network/bb530836.aspx.
Several different
network performance objects relate to the TCP/IP protocol, including
ICMP, IPv4, IPv6, Network Interface, TCPv4, UDPv6, and more. Other
counters such as FTP Server and WINS Server are added after these
services are installed. Because entire books are dedicated to optimizing
TCP/IP, this section focuses on a few important counters that you
should monitor for capacity-analysis purposes.
First, examining
error counters, such as Network Interface: Packets Received Errors or
Packets Outbound Errors, is extremely useful in determining whether
traffic is easily traversing the
network. The greater the number of errors, the more packets must be
retransmitted, causing additional network traffic. If a high number of
errors is persistent on the network, throughput will suffer. This can be caused
by a bad NIC, unreliable links, and so on.
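One simple way to quantify this is to compute the fraction of errored packets from the Network Interface counters named above. A minimal sketch, assuming the counter values have already been sampled (the function and parameter names are illustrative):

```python
def packet_error_rate(received, received_errors, outbound, outbound_errors):
    """Fraction of all packets (in both directions) that were errored,
    combining Packets Received Errors and Packets Outbound Errors."""
    total = received + received_errors + outbound + outbound_errors
    if total == 0:
        return 0.0
    return (received_errors + outbound_errors) / total
```

A rate that stays near zero indicates traffic is traversing the network cleanly; a persistently elevated rate points toward a bad NIC, unreliable links, or similar faults.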
If network throughput
appears to be slowing because of excessive traffic, keep a close watch
on the traffic being generated from network-based services such as the
ones described in Table 2. Figure 5 shows these items being recorded in Performance Monitor.
Table 2. Network-Based Service Counters Used to Monitor Network Traffic

| Object | Counter | Description |
|---|---|---|
| Network Interface | Current Bandwidth | Displays used bandwidth for the selected network adapter |
| Server | Bytes Total/sec | Monitors the network traffic generated by the Server service |
| Redirector | Bytes Total/sec | Processes data bytes received for statistical calculations |
| NBT Connection | Bytes Total/sec | Monitors the network traffic generated by NetBIOS over TCP connections |
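The Current Bandwidth and Bytes Total/sec counters in Table 2 can be combined to estimate how much of an adapter's capacity is in use. A minimal sketch with an illustrative function name; note that Current Bandwidth is reported in bits per second, so the byte counter is multiplied by 8:

```python
def link_utilization_percent(current_bandwidth_bps, bytes_total_per_sec):
    """Estimate the percentage of the adapter's rated bandwidth in use,
    given Current Bandwidth (bits/sec) and Bytes Total/sec (bytes/sec)."""
    return (bytes_total_per_sec * 8) / current_bandwidth_bps * 100
```

For example, 12.5 MB/sec of traffic on a 1Gbps adapter works out to roughly 10% utilization.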